AutoML: Train "the best" Image Classification Multi-Class model for a 'Fridge items' dataset.

Requirements - In order to benefit from this tutorial, you will need:

Learning Objectives - By the end of this tutorial, you should be able to:

Motivations - This notebook explains how to set up and run an AutoML image classification multi-class job. This is one of the ML tasks supported by AutoML; others include 'forecasting', 'classification', 'image object detection', 'nlp text classification', etc.

In this notebook, we go over how you can use AutoML to train an Image Classification Multi-Class model. We will use a small dataset to train the model, demonstrate how you can tune its hyperparameters to optimize performance, and deploy the model for inference scenarios.

1. Connect to Azure Machine Learning Workspace

The workspace is the top-level resource for Azure Machine Learning, providing a centralized place to work with all the artifacts you create when you use Azure Machine Learning. In this section we will connect to the workspace in which the job will be run.

1.1. Import the required libraries

1.2. Configure workspace details and get a handle to the workspace

To connect to a workspace, we need identifier parameters: a subscription ID, resource group name, and workspace name. We will use these details in the MLClient from azure.ai.ml to get a handle to the required Azure Machine Learning workspace. We use the default Azure authentication for this tutorial. Check the configuration notebook for more details on how to configure credentials and connect to a workspace.

2. MLTable with input Training Data

In order to generate models for computer vision tasks with automated machine learning, you need to bring labeled image data as input for model training in the form of an MLTable. You can create an MLTable from labeled training data in JSONL format. If your labeled training data is in a different format (such as Pascal VOC or COCO), you can use a conversion script to first convert it to JSONL, and then create an MLTable. Alternatively, you can use Azure Machine Learning's data labeling tool to manually label images, and export the labeled data to use for training your AutoML model.

In this notebook, we use a toy dataset called Fridge Objects, which consists of 134 images across 4 classes of beverage containers {can, carton, milk bottle, water bottle}, photographed against different backgrounds.

All images in this notebook are hosted in this repository and are made available under the MIT license.

NOTE: In this PRIVATE PREVIEW we're defining the MLTable in a separate folder and .YAML file. In later versions, you'll be able to do it all in Python APIs.

2.1. Download the Data

We first download and unzip the data locally.

This is a sample image from this dataset:

2.2. Upload the images to Datastore through an AML Data asset (URI Folder)

In order to use the data for training in Azure ML, we upload it to the default Azure Blob Storage of our Azure ML Workspace.

Check this notebook for AML data asset example

2.3. Convert the downloaded data to JSONL

In this example, the fridge object dataset is stored in a directory. There are four folders inside, one per class: water_bottle, milk_bottle, carton, and can.

This is the most common data format for multiclass image classification. Each folder title corresponds to the image label for the images contained inside. In order to use this data to create an AzureML MLTable, we first need to convert it to the required JSONL format. Please refer to the documentation on how to prepare datasets.

The following script creates two .jsonl files (one for training and one for validation) in the corresponding MLTable folder. The train/validation split sends 20% of the data to the validation file.

2.4. Create MLTable data input

Create MLTable data input using the jsonl files created above.

To create a data input from a TabularDataset created using the V1 SDK, specify the type as AssetTypes.MLTABLE, the mode as InputOutputModes.DIRECT, and the path in the following format: azureml:<tabulardataset_name>:<version>.

3. Compute target setup

You will need to provide a compute target that will be used for your AutoML model training. AutoML models for image tasks require GPU SKUs such as those from the NC, NCv2, NCv3, ND, NDv2 and NCasT4 series. We recommend the NCv3 series (with V100 GPUs) for faster training. Using a compute target with a multi-GPU VM SKU leverages the multiple GPUs to speed up training. Additionally, setting up a compute target with multiple nodes allows for faster model training through parallelism when tuning hyperparameters for your model.

4. Configure and run the AutoML for Images Classification-MultiClass training job

AutoML allows you to easily train models for Image Classification, Object Detection & Instance Segmentation on your image data. You can control the model algorithm to be used, specify hyperparameter values for your model as well as perform a sweep across the hyperparameter space to generate an optimal model.

When using AutoML for image tasks, you need to specify the model algorithms using the model_name parameter. You can either specify a single model or choose to sweep over multiple models. Please refer to the documentation for the list of supported model algorithms.

4.1. Using default hyperparameter values for the specified algorithm

Before doing a large sweep to search for the optimal models and hyperparameters, we recommend trying the default values for a given model to get a first baseline. Next, you can explore multiple hyperparameters for the same model before sweeping over multiple models and their parameters. This allows an iterative approach, as with multiple models and multiple hyperparameters for each (as we showcase in the next section), the search space grows exponentially, and you need more iterations to find optimal configurations.

The following functions are used to configure the AutoML image job:

image_classification() function parameters:

The image_classification() factory function allows the user to configure the training job.

set_limits() function parameters:

This is an optional configuration method to configure limit parameters such as timeouts.

set_image_model() function parameters:

This is an optional configuration method to configure fixed settings or parameters that don't change during the parameter space sweep. Some of the key parameters of this function are:

If you wish to use the default hyperparameter values for a given algorithm (say vitb16r224), you can specify the job for your AutoML Image runs as follows:

Submitting an AutoML job for Computer Vision tasks

Once you've configured your job, you can submit it as a job in the workspace in order to train a vision model using your training dataset.

4.2. Hyperparameter sweeping for your AutoML models for computer vision tasks

When using AutoML for Images, we can perform a hyperparameter sweep over a defined parameter space to find the optimal model. In this example, we sweep over the hyperparameters for seresnext, resnet50, vitb16r224, and vits16r224 models, choosing from a range of values for learning_rate, number_of_epochs, layers_to_freeze, etc., to generate a model with the optimal 'accuracy'. If hyperparameter values are not specified, then default values are used for the specified algorithm.

The set_sweep() function is used to configure the sweep settings:

set_sweep() parameters:

We use Random Sampling to pick samples from this parameter space and try a total of 10 iterations with these different samples, running 2 iterations at a time on our compute target. Please note that the more parameters the space has, the more iterations you need to find optimal models.

We leverage the Bandit early termination policy, which terminates poorly performing configurations (those not within 20% slack of the best performing configuration), significantly saving compute resources.

For more details on model and hyperparameter sweeping, please refer to the documentation.

When doing a hyperparameter sweep, it can be useful to visualize the different configurations that were tried using the HyperDrive UI. You can navigate to this UI by going to the 'Child jobs' tab in the UI of the main AutoML image job from above, which is the HyperDrive parent run, and then opening its 'Trials' tab. Alternatively, you can navigate directly to the HyperDrive parent run below and open its 'Trials' tab:

5. Retrieve the Best Trial (Best Model's trial/run)

Use the MLflow client to access the results (such as models, artifacts, and metrics) of a previously completed AutoML trial.

Initialize MLFlow Client

The models and artifacts that are produced by AutoML can be accessed via the MLflow interface. Initialize the MLflow client here, and set the backend to Azure ML via the MLflow client.

IMPORTANT: you need to have installed the latest MLflow packages with:

pip install azureml-mlflow

pip install mlflow

Get the AutoML parent Job

Get the AutoML best child run

Get best model run's metrics

Access the results (such as Models, Artifacts, Metrics) of a previously completed AutoML Run.

Download the best model locally

6. Register best model and deploy

6.1 Create managed online endpoint

6.2 Register best model and deploy

Register model

Deploy

Get endpoint details

Online Inference

Visualize predictions

Now that we have scored a test image, we can visualize the predicted class for this image.

Generate the Scores and Explanations

Visualize Explanations

Delete the deployment and endpoint

Next Step: Load the best model and try predictions

Loading the model locally assumes that you are running the notebook in an environment compatible with the model. The list of dependencies expected by the model is specified in the MLflow model produced by AutoML (in the 'conda.yaml' file within the mlflow-model folder).

Since the AutoML model was trained remotely in an environment with different dependencies from the local conda environment where you are running this notebook, you have several options if you want to load the model:

  1. The recommended way to load the model locally and try predictions is to create a new/clean conda environment with the dependencies specified in the conda.yaml file within the MLflow model's folder, then use MLflow to load the model and call .predict(), as explained in the notebook mlflow-model-local-inference-test.ipynb in this same folder.

  2. You can install all the packages/dependencies specified in conda.yaml into the current conda environment you used for the Azure ML SDK and AutoML. The MLflow SDK also has a method to install the dependencies into the current environment. However, this option risks package version conflicts, depending on what is already installed in your current environment.

  3. You can also serve the model as a local REST endpoint with: mlflow models serve -m 'xxxxxxx'

Next Steps

You can see further examples of other AutoML tasks, such as Regression, Image-Object-Detection, NLP-Text-Classification, Time-Series-Forecasting, etc.